National Association of Psychometrists 2025 Annual Meeting
2025-06-13
Describe the general goals of data analysis and statistics in both research and applied settings
Understand the core concepts and vocabulary used in machine learning/AI techniques
Compare and contrast the use-cases for “classic” statistical models versus those better addressed by machine learning
Survey the current state of research with these methods, specifically related to the context of neuropsychological assessment
Critically evaluate the practicality of using these methods in clinics, as well as the procedural, ethical, and statistical problems associated with the methods
Born out of inspiration and frustration: neuropsychology has been credibly accused of “falling behind” (Miller & Barr, 2017; Singh & Germine, 2021)
Public and corporate pressure to adopt new and flashy technology, many already see adoption of AI by providers as a goal (Hou et al., 2024)
Omnipresence of AI- and machine learning-based tools, and proliferation of AI-focused academic research, though it does not always mesh with non-AI-focused research (Duede et al., 2024)
Reflecting on my own role as both educator and consumer
Sharing my perspective of a cautious/skeptical optimist
“Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke
Today, I will primarily focus on methods described as machine learning and “AI” and their use cases in neuropsychological evaluation, specifically the prediction of patient cognition from psychological test scores.
Review some prominent and recent findings from this booming area of research
New benefits with these new techniques (which is exciting), but new challenges as well…
Disclaimer: Some cited articles are pre-prints and have not yet undergone peer-review, please review cited studies before using for decision-making
Evidence-based assessment
Build rapport, trust, and education in patients
Improve validity and reliability of tests and results
Public and academic perception of neuropsychology as valuable patient service
Refinement of accuracy of diagnosis in assessment
“…statistical methods have a long-standing focus on inference, which is achieved through the creation and fitting of a project-specific probability model… By contrast, ML [machine learning] concentrates on prediction by using general-purpose learning algorithms to find patterns in often rich and unwieldy data” (Bzdok et al., 2018)
Techniques such as the z-test, t-test, ANOVA, Pearson’s r, OLS linear regression, and many others
Result in p-values, beta coefficients, effect sizes, etc.
Value comes from estimating the probability and magnitude of outcomes associated with different variables
More explainable, tangible results
Hypothetical Example
| Variable | Beta | P-value |
|---|---|---|
| Age | -0.39 | 0.04 |
| Education | 1.94 | 0.001 |
| Gender (Male Dummy Code) | 0.03 | 0.39 |
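A hypothetical table like the one above can be reproduced in miniature. The sketch below (pure NumPy, with fully synthetic data invented for illustration; the variable names and coefficients are assumptions, not from any real dataset) fits an OLS regression and reports beta coefficients with normal-approximation p-values.

```python
import math
import numpy as np

# Synthetic data loosely mirroring the hypothetical table above
# (age, education, and a male dummy code predicting a test score).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(55, 85, n)
educ = rng.uniform(8, 20, n)
male = rng.integers(0, 2, n).astype(float)
score = 100 - 0.4 * age + 2.0 * educ + rng.normal(0, 5, n)

# Design matrix with an intercept column; one closed-form fit.
X = np.column_stack([np.ones(n), age, educ, male])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Standard errors, t-statistics, and (normal-approximation) p-values.
resid = score - X @ beta
df = n - X.shape[1]
sigma2 = resid @ resid / df
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se
p_values = [2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))) for t in t_stats]

for name, b, p in zip(["Intercept", "Age", "Education", "Male"], beta, p_values):
    print(f"{name:>10}: beta = {b:7.3f}, p = {p:.4f}")
```

The payoff is the interpretability the slide describes: each beta is a tangible, directional estimate of one variable’s contribution, with a p-value quantifying its evidential weight.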
Evidence of (convergent/concurrent) validity: correlation of FSIQ between WAIS-IV and WAIS-5 - r = 0.92, statistically significant (Wechsler, 2025, p. 85)
Quantifying average differences between different types of individuals: WMS-IV LM I comparison between MCI (M = 8.4, SD = 2.7) and matched control group (M = 11.4, SD = 3.0) – t = 4.93, p < 0.01. (Wechsler, 2009, p. 113)
Developing appropriate normative data to compare our patients against and establish relative cognitive ability (examples in Mitrushina et al., 2005, p. 649)
Involves some type of computational “learning”: the model iterates through available data and “trains” in some manner
Has existed for a long time, but has scaled up greatly with exponential rise in technology
Central goal of improving how accurately we can predict an outcome of interest
De-prioritizes giving exact, interpretable values to variables
Is especially well suited to dealing with MANY different variables at one time
Numerous techniques in this broad family - e.g. Classification and Regression Trees (CART), Random Forest (RF), Gradient-boosted Models (GBM), etc.
Example - CART Model (Lavery et al., 2007)
Screener test (MMSE) score used as a predictor, alongside other cognitive tests, with dementia as outcome
MMSE scores are given binary splits (by the model) that produce distinct classification odds of dementia
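To make the single-split logic behind CART concrete, the sketch below simulates MMSE scores for control and dementia groups (invented data, not from Lavery et al., 2007) and searches for the binary cutoff that minimizes Gini impurity, yielding distinct classification odds of dementia on each side of the split.

```python
import numpy as np

# Simulated MMSE scores: controls centered near 28, dementia near 22.
rng = np.random.default_rng(1)
mmse_controls = np.clip(rng.normal(28, 1.5, 100).round(), 0, 30)
mmse_dementia = np.clip(rng.normal(22, 3.0, 100).round(), 0, 30)
mmse = np.concatenate([mmse_controls, mmse_dementia])
dementia = np.concatenate([np.zeros(100), np.ones(100)])

def gini(y):
    """Gini impurity of a binary label vector (0 = pure group)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

# Try every observed cutoff; keep the one whose two resulting groups
# have the lowest weighted impurity (the CART splitting criterion).
best_cut, best_impurity = None, float("inf")
for cut in np.unique(mmse):
    left, right = dementia[mmse <= cut], dementia[mmse > cut]
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(dementia)
    if impurity < best_impurity:
        best_cut, best_impurity = cut, impurity

low = dementia[mmse <= best_cut]
high = dementia[mmse > best_cut]
print(f"Best split: MMSE <= {best_cut:.0f}")
print(f"P(dementia | MMSE <= cut) = {low.mean():.2f}")
print(f"P(dementia | MMSE >  cut) = {high.mean():.2f}")
```

A full CART model simply repeats this search recursively within each resulting group, and a Random Forest averages many such trees fit on resampled data.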
Artificial intelligence, as we know it today, is largely a marketing term encompassing extremely powerful and computationally expensive ML models (Toh et al., 2019)
Some literature reviews have tried to distinguish the terms (Kühl et al., 2022), but the underlying mathematical methods are still the same as ML.
Still focused on maximizing predictive accuracy via especially large datasets and high computational power
“Chat” services and note-writing assistance tools use a framework called a “large language model” (LLM) and are already finding use in medical settings (Thirunavukarasu et al., 2023).
LLMs predict appropriate human-like language (the outcome) in response to prompts, context, and training data (the predictors).
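At a vastly smaller scale, the “predict the next word from context” idea can be sketched as a bigram model: count which word follows each word in some training text, then predict the most frequent follower. The corpus below is invented for illustration; real LLMs use deep neural networks over enormous corpora, so this is only the conceptual skeleton.

```python
from collections import Counter, defaultdict

# Toy "training data": the model learns which word tends to follow which.
corpus = (
    "the patient completed the memory test "
    "the patient completed the attention test "
    "the examiner scored the memory test"
).split()

# Count, for each word, the words observed immediately after it.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word` in training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("patient"))  # → "completed"
print(predict_next("memory"))   # → "test"
```

The model can only echo patterns present in its training text, which previews the bias and “hallucination” issues discussed later: asked about a word it has never seen, it has nothing principled to say.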
| Criteria | Classic Inferential Models | ML/AI Models |
|---|---|---|
| Number of “trials” | Fits one set of parameters for predictor/grouping variables (static) | Uses multiple iterations to improve parameters by some strategy or algorithm (dynamic) |
| Primary focus or goal | Inferring likelihood and magnitude of variable effects on an outcome | Accurate prediction of outcome, reducing amount of error |
| Explainability of results | Reasonably able to interpret variables and the size of their contributions to the outcome | Depending on the model, can be relatively difficult to understand how individual variables predict the outcome |
| Difficulty of computation | Low technology demand, mostly calculatable by hand | Depending on the model, extremely taxing, large data and technological demand |
| Number of variables | Fewer variables possible for certain tests (e.g., t-test); even multiple regression models restrict the total number | Many more predictor variables possible; predictive power often scales with additional relevant data |
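The first row of the table above (static vs. dynamic fitting) can be made concrete with a minimal sketch: the same simple regression fit once in closed form, and again by many small gradient-descent steps that iteratively improve the parameters. The data are synthetic and the learning rate and step count are illustrative choices.

```python
import numpy as np

# One predictor, one outcome; the same data fit two ways.
rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, 500)
X = np.column_stack([np.ones_like(x), x])

# "Classic" route: one closed-form solve (static parameters).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# "ML" route: many small gradient steps on mean squared error
# (dynamic parameters, improved iteration by iteration).
beta_gd = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ beta_gd - y) / len(y)
    beta_gd -= lr * grad

print("closed-form:", beta_ols.round(3))
print("iterative  :", beta_gd.round(3))
```

On a problem this small both routes land on essentially the same parameters; the iterative strategy earns its keep on models (deep networks, boosted trees) where no closed-form solution exists.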
ML/AI can extract insights from complex test data
Especially useful with large, multivariate sets
Most common in visuospatial testing so far
Can inform test selection and diagnostic decisions
ML and “AI” tools may be costly to develop, run, or subscribe to (in the case of commercial software products) and thus, may not be feasible to run on in-clinic hardware or purchase within budget constraints (Crawford, 2021).
Clinicians and psychometrists need effective training in navigating and understanding these models to use and apply them (Hedderich et al., 2021)
Depending on the setup, it may be necessary to consider how patient data privacy and confidentiality may be at risk when interacting with a model (Chen & Esmaeilzadeh, 2024)
Though some healthcare professionals may be excited and have positive feelings towards AI (Catalina et al., 2023), others may be more wary and object to its use and the impact it may have on the workforce and staffing.
Providers may hesitate to fully trust these models when their outputs conflict with the providers’ own judgments (Lebovitz et al., 2022), which is especially concerning given the high stakes associated with diagnostic accuracy in neuropsychological evaluations.
Patients may place less trust in medical care that relies totally or partially on “AI” tools, compared to diagnostics performed only by providers (Clements et al., 2022)
Unlike inferential statistical models, some ML and AI models suffer from the “black box” problem of not explaining why or how predictions are made, though work is being done to resolve this issue (Poon & Sung, 2021)
AI may “hallucinate”, i.e., produce results that are completely incorrect or fundamentally flawed, as it is effectively “guessing” at or estimating a correct response (Alkaissi & McFarlane, 2023)
ML and AI are reliant upon their training data and are thus biased towards patterns existing in that data; when confronted with a case unlike those in the training data, the model is liable to fall back on what it already “knows” (Barocas et al., 2023)
AI and machine learning is something to be excited about!… with caveats.
While there are many exciting developments in computational techniques for analysis in neuropsychology, there are still many considerations to weigh when applying these ideas
Return to why we use statistics and quantitative analysis in practice: do these new technologies support and enhance those goals?
Psychometrists should be deliberate in building sufficient knowledge of these technological advances to weigh how to implement them in practice
Presentations
Academic Articles